
fix: add eval-before-train to train_async.py (parity with train.py)#1906

Open
Taosheng-ty wants to merge 2 commits into THUDM:main from Taosheng-ty:feat/eval-before-train-async

Conversation

@Taosheng-ty

Summary

  • train.py evaluates the model before training starts (to record a baseline metric), but train_async.py was missing this step
  • Add the same 3-line check to train_async.py, placed after update_weights() and before the first generate.remote() call

Condition (matches train.py exactly)

if args.eval_interval is not None and args.start_rollout_id == 0 and not args.skip_eval_before_train:
    ray.get(rollout_manager.eval.remote(args.start_rollout_id))
  • Only fires when --eval-interval is set
  • Only on fresh starts (start_rollout_id == 0), not on resume from checkpoint
  • Skippable via --skip-eval-before-train
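The three conditions above can be checked with a small standalone predicate. The parser below is a hypothetical stand-in for the repo's arguments.py (only the flag names and defaults are taken from this PR):

```python
import argparse

# Hypothetical stand-in for the flags defined in arguments.py;
# flag names and defaults match the PR description.
parser = argparse.ArgumentParser()
parser.add_argument("--eval-interval", type=int, default=None)
parser.add_argument("--start-rollout-id", type=int, default=0)
parser.add_argument("--skip-eval-before-train", action="store_true", default=False)

def should_eval_before_train(args):
    # Mirrors the condition added to train_async.py.
    return (args.eval_interval is not None
            and args.start_rollout_id == 0
            and not args.skip_eval_before_train)

# Fresh start with --eval-interval set: baseline eval fires.
assert should_eval_before_train(parser.parse_args(["--eval-interval", "5"]))
# Resume from a checkpoint: eval is skipped.
assert not should_eval_before_train(
    parser.parse_args(["--eval-interval", "5", "--start-rollout-id", "3"]))
# Explicit opt-out: eval is skipped.
assert not should_eval_before_train(
    parser.parse_args(["--eval-interval", "5", "--skip-eval-before-train"]))
```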

Placement

actor_model.update_weights()         ← sglang gets initial weights
check_weight_update_equal            ← verify weights
>>> eval before training <<<         ← NEW: baseline eval with initial weights
rollout_data_next_future = ...       ← first rollout generation starts
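Put in context, the startup sequence looks roughly like the sketch below. The stubs are hypothetical stand-ins for the Ray actor methods named in the diagram; only the call order is the point:

```python
calls = []

# Hypothetical stand-ins for the actor methods; they only record call order.
def update_weights():
    calls.append("update_weights")

def check_weight_update_equal():
    calls.append("check_weights")

def eval_remote(rollout_id):
    calls.append(f"eval({rollout_id})")

def generate_remote(rollout_id):
    calls.append(f"generate({rollout_id})")

eval_interval = 5
start_rollout_id = 0
skip_eval_before_train = False

update_weights()                    # sglang gets the initial weights
check_weight_update_equal()         # verify the transfer
if (eval_interval is not None and start_rollout_id == 0
        and not skip_eval_before_train):
    eval_remote(start_rollout_id)   # NEW: baseline eval with initial weights
generate_remote(start_rollout_id)   # first rollout generation starts

assert calls == ["update_weights", "check_weights", "eval(0)", "generate(0)"]
```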

Test plan

  • Verified --skip-eval-before-train arg exists in arguments.py (store_true, default=False)
  • Verified condition matches train.py line 68
  • Verified eval runs after update_weights() so sglang has correct weights
  • Verified skipped on resume (start_rollout_id > 0)

🤖 Generated with Claude Code

tttaosheng and others added 2 commits May 13, 2026 01:46
For multi-turn agent rollouts where tool-result tokens dominate the
response (often >90%), computing log-probs and entropy for all positions
wastes memory and compute — those masked positions contribute zeros to
the loss anyway.

This adds a loss_masks parameter to get_log_probs_and_entropy. When
provided (and cp_size == 1), only positions where mask == 1 go through
the expensive vocab-parallel softmax. Outputs are padded back to the
original response length with zeros so all downstream code (advantages,
sum_of_sample_mean, etc.) works unchanged.
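The filtering idea can be illustrated with a small NumPy sketch. This is a standalone illustration of the technique, not the repo's vocab-parallel implementation; the function and argument names below are made up:

```python
import numpy as np

def masked_logprobs_entropy(logits, tokens, loss_mask):
    """Compute per-token log-probs and entropy only where loss_mask == 1.

    logits: (T, V) float array, tokens: (T,) chosen-token ids,
    loss_mask: (T,) array of 0/1. Returns (log_probs, entropy), each (T,),
    padded with zeros at masked-out positions so downstream code is unchanged.
    """
    T, V = logits.shape
    log_probs = np.zeros(T)
    entropy = np.zeros(T)
    idx = np.nonzero(loss_mask)[0]            # positions that contribute to the loss
    if idx.size == 0:
        return log_probs, entropy
    sel = logits[idx]                         # only these rows go through softmax
    sel = sel - sel.max(axis=1, keepdims=True)         # numerical stability
    log_z = np.log(np.exp(sel).sum(axis=1, keepdims=True))
    logp = sel - log_z                        # log-softmax over the vocab, (k, V)
    log_probs[idx] = logp[np.arange(idx.size), tokens[idx]]
    entropy[idx] = -(np.exp(logp) * logp).sum(axis=1)
    return log_probs, entropy
```

With 97% of positions masked out, only ~3% of the rows ever reach the softmax, which is where the quoted ~30x reduction in softmax compute comes from.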

Typical savings for agentic workloads:
  - 97% masked tokens → ~30x reduction in softmax compute
  - Prevents OOM on long multi-turn samples with large tool outputs
  - Communication in vocab-parallel all-reduces drops proportionally

Limitations:
  - Only active when cp_size == 1 (falls through to unfiltered path
    for context parallelism > 1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
train.py runs an evaluation step before the first training rollout when
--eval-interval is set and --skip-eval-before-train is not passed. This
provides a baseline metric for comparison. train_async.py was missing
this, so users had no pre-training eval checkpoint to compare against.

Add the same check, placed after update_weights() (so sglang has the
correct initial weights) and before the first generate.remote() call.
Only fires on fresh starts (start_rollout_id == 0), not on resume.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
